Background: what’s out there (visualization tools), why this is useful (there are not many detailed examples showing the code; talk about your experience at Sunbelt, “what’s the format of the data?”; look for papers on computing literacy), and our goal (start-to-finish network visualization: load the data, process it a little, and plot it).
Talk about the different aspects of network visualization the user needs to consider: layout, vertex size, vertex colour, vertex shape, edges, edge width, etc. Talk about the different components and how we can use them (for example, what each can represent), as well as the size of the network and the type of network (egocentric, small, large, bipartite, etc.).
In terms of layouts, what are the things we need to consider? (We can mention R packages that implement layouts.)
Network visualization has many aspects that need to be considered for the visualization to get the necessary information across effectively. One aspect to consider is the layout of the network.
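As a small illustration of what a layout is, the sketch below uses igraph's layout functions on a built-in example graph (Zachary's karate club, which is not the dataset used later in this document). Each layout is simply a matrix of x and y coordinates with one row per vertex, so different algorithms can be compared on the same graph.

```r
library(igraph)

# Example graph shipped with igraph; used here only for illustration
g <- make_graph("Zachary")

set.seed(1)  # force-directed layouts start from random positions
layouts <- list(
  fruchterman_reingold = layout_with_fr(g),
  kamada_kawai         = layout_with_kk(g),
  circle               = layout_in_circle(g)
)

# Every layout is a two-column matrix with one row per vertex
sapply(layouts, dim)

# Draw the same graph under each layout, side by side
op <- par(mfrow = c(1, 3), mar = c(0, 0, 2, 0))
for (nm in names(layouts)) {
  plot(g, layout = layouts[[nm]], vertex.label = NA, vertex.size = 8, main = nm)
}
par(op)
```

The force-directed layouts tend to pull connected vertices together, while the circle layout ignores the ties entirely, so the choice of layout changes which patterns are visible.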
Vertex Size: Another aspect to consider is the size of the vertices.
Vertex color: The color of the vertex is also an important consideration. Color can make objects more visually appealing, but it is also useful for differentiating objects or levels of an object (Ognyanova, 2019). It can also help in visualizing groups and in detecting patterns or clusters (Tyner, 2017).
Vertex shape:
Edges:
Edge width:
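To make these components concrete, here is a minimal sketch using igraph's base plotting on a built-in example graph. The `group` attribute is made up for this illustration, and the specific mappings (size for degree, color and shape for group) are just one possible choice.

```r
library(igraph)

g <- make_graph("Zachary")

# Hypothetical two-group attribute, invented for this example
V(g)$group <- ifelse(seq_len(vcount(g)) <= 17, "a", "b")

deg <- degree(g)

set.seed(1)
plot(
  g,
  layout       = layout_with_fr(g),
  vertex.size  = 4 + 12 * deg / max(deg),                        # size: degree
  vertex.color = ifelse(V(g)$group == "a", "gray40", "red3"),    # color: group
  vertex.shape = ifelse(V(g)$group == "a", "circle", "square"),  # shape: group
  vertex.label = NA,
  edge.width   = 1.5,
  edge.color   = "gray70"
)
```

Mapping the same attribute to both color and shape is redundant on purpose: it keeps the groups distinguishable when the figure is printed in grayscale.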
First, the data needs to be loaded. It can be found at (INSERT LINK HERE). After loading it, let’s get a glimpse of what the data looks like.
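# attaching packages
library(igraph)
library(data.table)
# installing netplot from GitHub only if it is not already available
if (!requireNamespace("netplot", quietly = TRUE)) {
  devtools::install_github("USCCANA/netplot")
}
library(netplot)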
# loading and cleaning data
students <- fread("./data/middle_school/pone.0153690.s001.csv")
interactions <- fread("./data/middle_school/pone.0153690.s003.csv")
print(students)
##        id grade gender unique lunch initialsNum
##   1: 2003     7      0      0     1         386
##   2: 2004     8      1      1     1         402
##   3: 2006     7      1      1     2         288
##   4: 2008     8      0      1     1         199
##   5: 2009     7      1      0     1         147
##  ---
## 674:   NA     8      0      0    99         171
## 675:   NA     8      0      1    99         270
## 676:   NA     8      0      1    99         327
## 677:   NA    99      1      0    99         378
## 678:   NA     7      1      0    99         277
print(interactions)
##          id contactGender contactGrade contactId ClassPeriod contactInitialNum
##     1: 2004             1            8      3127           4               323
##     2: 2004             0            8      2620           1               335
##     3: 2004             1            8        99           1               401
##     4: 2004             1            8        99           9               401
##     5: 2004             1            8        99           9               401
##    ---
## 10777: 3448             1            7        99           4                79
## 10778: 3448             1            7        99           2                17
## 10779: 3448             1            7        99           4                17
## 10780: 3448             1            7      3439           3               155
## 10781: 3448             1            7        99           3               294
Before using the data, we need to remove the missing values (NA) and miscoded entries in both datasets. We also see a large number of students who only have interactions with themselves (they do not interact with anyone else during the day). These “isolates” need to be removed to make the graph easier to read.
# filtering out NAs in the 'students' data frame
students <- students[!is.na(id)]
# keeping only students whose gender is coded "0" or "1"
students <- students[gender %in% c("0", "1")]
# filtering out NAs in 'id' and 'contactId'
interactions <- interactions[!is.na(id) & !is.na(contactId)]
# ids of the students we kept
ids <- sort(unique(students$id))
# keeping only interactions where both students are in 'ids'
# (this narrows the data from 10781 to 5150 rows)
interactions <- interactions[(id %in% ids) & (contactId %in% ids)]
# loading the 'color_nodes' helper function used later on
source(file = "./misc/color_nodes_function.R")
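The filtering logic above can be checked on a tiny, made-up example before trusting it on the full files. The toy tables below only mimic the structure of the real data, and treating the values 99 as miscoded entries is an assumption made for this sketch.

```r
library(data.table)

# Made-up records mimicking the structure of the two files
toy_students <- data.table(id = c(1, 2, 3, NA), gender = c(0, 1, 99, 1))
toy_contacts <- data.table(id = c(1, 1, 2, 3), contactId = c(2, 99, 1, 1))

# Keep students with a known id and a valid gender code
toy_students <- toy_students[!is.na(id) & gender %in% c(0, 1)]

# Keep interactions whose two endpoints are both retained students
valid_ids    <- sort(unique(toy_students$id))
toy_contacts <- toy_contacts[(id %in% valid_ids) & (contactId %in% valid_ids)]

print(toy_contacts)
```

Only the ties between students 1 and 2 survive: the tie to `99` is dropped because no student has that id, and the tie from student 3 is dropped because that student was removed for an invalid gender code.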
Next, the two datasets are combined into a single network object.
## Creating the graph from the two datasets
net <- graph_from_data_frame(
  d = interactions[, .(id, contactId)],
  directed = FALSE,
  vertices = as.data.frame(students)
)
## Getting only connected individuals
net_with_no_isolates <- induced_subgraph(net, which(degree(net) > 0))
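One detail worth knowing when removing isolates this way: igraph's `degree()` counts self-loops by default, so a vertex whose only tie is to itself still has a positive degree. Whether the school data actually contains such self-ties is not checked here; the toy graph below is made up to show the difference.

```r
library(igraph)

# Toy edge list: vertex "c" only has a tie to itself, "d" has no ties at all
edges    <- data.frame(from = c("a", "c"), to = c("b", "c"))
vertices <- data.frame(name = c("a", "b", "c", "d"))
g <- graph_from_data_frame(edges, directed = FALSE, vertices = vertices)

degree(g)                 # the self-loop on "c" counts twice
degree(g, loops = FALSE)  # ignoring loops, "c" has degree 0

# Drop vertices that have no ties to anyone else
g_connected <- induced_subgraph(g, which(degree(g, loops = FALSE) > 0))
V(g_connected)$name
```

If self-ties should not keep a student in the plot, passing `loops = FALSE` (or calling `simplify()` on the graph first) is one way to handle it.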
Finally, we plot the network.
## Plot with no isolates
set.seed(3)
nplot(
  net_with_no_isolates
)
Next, we take the same network and customize a number of aspects of the plot. First, in order to work with the “color_nodes” function, we need to make “grade” a factor instead of a numeric variable. We also choose the colors we would like the nodes to have.
## adjust 'grade' to factor
V(net_with_no_isolates)$grade <- as.factor(V(net_with_no_isolates)$grade)
# plotting connections among grades ####
set.seed(3)
a_colors <- color_nodes(net_with_no_isolates,"grade", c("gray40","red3"))
attr(a_colors, "map")
## 7 8
## "#666666" "#CD0000"
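For readers without access to the sourced helper, the idea of mapping a factor to colors can be sketched in a few lines of base R. The function `map_colors()` below is hypothetical and is not the `color_nodes()` function used in this document; it only illustrates the general approach of assigning one color per level and keeping the level-to-color map as an attribute.

```r
# Hypothetical helper, NOT the color_nodes() function sourced earlier
map_colors <- function(x, colors) {
  x <- as.factor(x)
  # Interpolate so that there is exactly one color per level
  pal <- grDevices::colorRampPalette(colors)(nlevels(x))
  structure(pal[as.integer(x)], map = setNames(pal, levels(x)))
}

grade <- c(7, 8, 8, 7, 8)
cols  <- map_colors(grade, c("gray40", "red3"))

# One color per observation, plus the level-to-color lookup
cols
attr(cols, "map")
```

Keeping the map alongside the colors makes it easy to build a legend later without recomputing anything.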
Now we can create a new plot. It uses the same data as the plot above, but with the following adjustments to the nodes:
Color the vertices (‘vertex.color’) according to the student’s grade (7th graders in gray, 8th graders in red).
Adjust the shape of the vertices (‘vertex.nsides’). 7th graders are drawn as 10-sided polygons, which look close to circles; all other students are drawn as triangles.
Adjust the size of the vertices (‘vertex.size.range’).
Remove the node labels (‘vertex.label’).
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.color = color_nodes(net_with_no_isolates, "grade", c("gray40", "red3")),
  vertex.nsides = ifelse(V(net_with_no_isolates)$grade == 7, 10, 3),
  vertex.size.range = c(0.015, 0.020),
  vertex.label = NULL
)
print(grades)
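Because “grade” is now a factor, it is worth checking how comparisons such as `grade == 7` behave. The short sketch below uses a made-up vector to show that the comparison works on the level labels, while `as.numeric()` returns the internal level codes instead of the original values.

```r
grade <- as.factor(c(7, 8, 8, 7))

# Comparison uses the level labels, so this works as expected
grade == 7

# as.numeric() returns the internal level codes, not the original values
as.numeric(grade)

# Go through the labels to recover the original numbers
as.numeric(as.character(grade))

# Number of polygon sides per vertex: 10 for 7th graders, 3 otherwise
nsides <- ifelse(grade == 7, 10, 3)
nsides
```

This is why the `ifelse()` call in the plot can compare against `7` directly, but any arithmetic on the grade should first convert it back through `as.character()`.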
This looks good, but let’s alter the parameters we just set to give the plot a different look.
Change vertex.color to be tied to a color palette.
Adjust vertex.nsides to make 7th graders octagons and 8th graders hexagons.
Adjust vertex.size.range to make each vertex smaller.
Add and adjust vertex labels with the vertex.label.[specific_option] arguments:
vertex.label.fontsize adjusts the font size.
vertex.label.show adjusts the proportion of labels to keep.
Adjust vertex.frame.color to give each vertex an outline.
library(igraph)
library(RColorBrewer)
# Create a color palette using RColorBrewer
palette <- brewer.pal(3, "Set1") # Change the number and palette name as needed
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.color = color_nodes(net_with_no_isolates, "grade", palette),
  vertex.nsides = ifelse(V(net_with_no_isolates)$grade == 7, 8, 6),
  vertex.size.range = c(0.01, 0.011),
  vertex.label.fontsize = 10,
  vertex.label.show = .25,
  vertex.frame.color = "black"
)
print(grades)
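A note on the palette itself: `brewer.pal()` requires at least three colors, even though there are only two grades here. How the sourced `color_nodes()` helper uses the extra color is not shown in this document, so the sketch below only illustrates one explicit way to pick two colors from the palette and look them up by grade. The names `"7"` and `"8"` are our own labeling convention.

```r
library(RColorBrewer)

# brewer.pal() needs n >= 3, even when fewer colors are used
palette3 <- brewer.pal(3, "Set1")

# Two grades, so keep the first two colors and name them by grade
grade_colors <- setNames(palette3[1:2], c("7", "8"))

# Look up one color per student (made-up grades)
grade <- c("7", "8", "8", "7")
unname(grade_colors[grade])
```

Naming the colors by level makes the assignment explicit, which helps when the same colors need to appear in a legend.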
Now that we have explored a bit about vertices, let’s dive into options related to edges.
Change edge.width.range to make the edges wider or thinner.
Change edge.arrow.size to add arrows to the graph.
Change edge.color to blue.
Change edge.color.alpha to adjust transparency.
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  edge.width.range = c(.25, 1),
  edge.arrow.size = 100,
  edge.color = "dodgerblue4",
  edge.color.alpha = .33
)
print(grades)
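A width range such as `c(.25, 1)` is most informative when edges carry a weight. As a sketch of that idea (not netplot's internal code), the example below counts how often each pair appears in a made-up interaction table and rescales the counts linearly into a chosen range. The helper `rescale_to()` is hypothetical.

```r
library(data.table)

# Made-up interactions; the pair (1, 2) is reported three times
toy <- data.table(id = c(1, 1, 1, 2, 3), contactId = c(2, 2, 2, 3, 1))

# One row per ordered pair, with the number of reports as a weight
# (note that (1, 2) and (2, 1) would be counted separately here)
weighted <- toy[, .(weight = .N), by = .(id, contactId)]

# Hypothetical helper: linearly rescale values into the range [lo, hi]
rescale_to <- function(x, lo, hi) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep((lo + hi) / 2, length(x)))
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

weighted[, width := rescale_to(weight, 0.25, 1)]
print(weighted)
```

The most frequent pair gets the maximum width and the least frequent pairs get the minimum, which is the usual way a width range is put to use.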
Now, let’s adjust everything again to show more of what netplot can do with edges.
Adjust edge.color so that each edge is colored along a gradient between the colors of the two vertices it connects.
Set edge.curvature to 0 to draw edges as straight lines.
Set edge.line.lty to 5 to draw edges with long dashes.
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  edge.width.range = c(1, 1),
  vertex.color = color_nodes(net_with_no_isolates, "grade", c("blue", "red3")),
  edge.color = ~ego(alpha = 0.5) + alter(alpha = 0.5),
  edge.curvature = 0,
  edge.line.lty = 5
)
print(grades)
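To get a feel for what a color gradient with transparency looks like in terms of actual color values, the base R sketch below interpolates between two endpoint colors and then applies 50% transparency. This is only an illustration of the concept and makes no claim about how netplot computes edge colors internally. It also lists the names that base R graphics associates with the integer line types.

```r
# Colors of the two endpoints of a hypothetical edge
ego_col   <- "blue"
alter_col <- "red3"

# Five evenly spaced steps from one color to the other
steps <- grDevices::colorRampPalette(c(ego_col, alter_col))(5)

# Add 50% transparency; the result is in #RRGGBBAA form
faded <- grDevices::adjustcolor(steps, alpha.f = 0.5)
faded

# Integer line types 1 to 6 correspond to these names in base R graphics
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
lty_names[5]
```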
Using the same plot that we originally created, we can also adjust some aspects beyond vertices and edges.
Adjust bg.col to make the background color slate gray.
Adjust sample.edges to plot only a proportion of the edges.
Adjust skip.vertex to not plot vertices at all (left commented out in the code).
## Plot with a background color and a sample of the edges
set.seed(3)
nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  bg.col = "slategray1",
  # skip.vertex = TRUE,
  sample.edges = .5
)
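Sampling edges only at drawing time leaves the underlying graph untouched. If a thinned graph is needed for other purposes, a similar effect can be sketched directly in igraph, here on a built-in example graph rather than the school data. How netplot chooses which edges to draw is not covered here; this is just one simple way to keep a random half.

```r
library(igraph)

g <- make_graph("Zachary")

# Pick a random half of the edge ids to keep
set.seed(3)
keep <- sort(sample(ecount(g), size = round(0.5 * ecount(g))))

# Remove the edges that were not sampled; all vertices stay in the graph
g_half <- delete_edges(g, setdiff(seq_len(ecount(g)), keep))

c(before = ecount(g), after = ecount(g_half))
```

Because vertices are kept even when all of their edges are dropped, the thinned graph may contain new isolates.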
We can adjust these options further to get a different result.
Adjust skip.edges to remove edges altogether.
Adjust bg.col to misty rose.
Set zero.margins to TRUE.
## Plot with vertices only, a new background color, and no margins
set.seed(3)
nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  skip.edges = TRUE,
  bg.col = "mistyrose",
  zero.margins = TRUE
)